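# Next-Day Bitcoin Price Forecast

Required libraries:

```r
library(xts)        # extensible time series (xts) objects
library(quantmod)   # financial helpers such as periodReturn()
library(ggthemes)   # extra ggplot2 themes
library(dygraphs)   # interactive time series charts
library(tidyverse)  # readr, dplyr, tibble, ggplot2, ...
library(urca)       # unit-root and cointegration tests
library(tseries)    # time series tests, e.g. adf.test(), pp.test()
library(forecast)   # ARIMA and NNAR models, forecast accuracy, DM test
```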
Read the daily Bitcoin prices from the CSV file, keeping only the date and closing price:

```r
quotes_bitcoin <- read_csv("../data/Bitcoindata.csv",
                           col_select = c(Date, Close))
```

We can examine the structure of the resulting object:
```r
head(quotes_bitcoin)
```

```
# A tibble: 6 × 2
  timeOpen            close
  <dttm>              <dbl>
1 2018-10-03 00:00:00 6503.
2 2018-10-02 00:00:00 6556.
3 2018-10-01 00:00:00 6590.
4 2018-09-30 00:00:00 6626.
5 2018-09-29 00:00:00 6602.
6 2018-09-28 00:00:00 6644.
```
```r
tail(quotes_bitcoin)
```

```
# A tibble: 6 × 2
  timeOpen            close
  <dttm>              <dbl>
1 2012-01-06 00:00:00 6.60
2 2012-01-05 00:00:00 6.67
3 2012-01-04 00:00:00 5.55
4 2012-01-03 00:00:00 4.90
5 2012-01-02 00:00:00 5.22
6 2012-01-01 00:00:00 5.13
```
```r
glimpse(quotes_bitcoin)
```

```
Rows: 2,468
Columns: 2
$ timeOpen <dttm> 2018-10-03, 2018-10-02, 2018-10-01, 2018-09-30, 2018-09-29, …
$ close    <dbl> 6502.59, 6556.10, 6589.62, 6625.56, 6601.96, 6644.13, 6676.75…
```
Let's also check the class of the Close column:

```r
class(quotes_bitcoin$Close)
```

```
[1] "numeric"
```
Let's check the structure of the whole dataset:

```r
str(quotes_bitcoin)
```

```
tibble [2,468 × 2] (S3: tbl_df/tbl/data.frame)
 $ timeOpen: POSIXct[1:2468], format: "2018-10-03" "2018-10-02" ...
 $ close   : num [1:2468] 6503 6556 6590 6626 6602 ...
 - attr(*, "spec")=
  .. cols(
  ..   timeOpen = col_datetime(format = ""),
  ..   timeClose = col_skip(),
  ..   timeHigh = col_skip(),
  ..   timeLow = col_skip(),
  ..   name = col_skip(),
  ..   open = col_skip(),
  ..   high = col_skip(),
  ..   low = col_skip(),
  ..   close = col_double(),
  ..   volume = col_skip(),
  ..   marketCap = col_skip(),
  ..   timestamp = col_skip()
  .. )
```
Let's convert the Date column to class Date:

```r
quotes_bitcoin$Date <- as.Date(quotes_bitcoin$Date, format = "%d/%m/%Y")
```

We have to supply the format in which the date is originally stored:

* `%y`: 2-digit year
* `%Y`: 4-digit year
* `%m`: month
* `%d`: day
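A quick check of how these format codes behave on single strings; the dates below are illustrative literals, not rows read from the file. A format that does not match the string returns `NA` rather than raising an error, so it is worth checking for `NA`s after the conversion:

```r
# "%d/%m/%Y": day/month/4-digit year
d1 <- as.Date("04/10/2018", format = "%d/%m/%Y")
# "%y" expects a 2-digit year (00-68 are read as 20xx)
d2 <- as.Date("04/10/18", format = "%d/%m/%y")
# swapping %d and %m silently gives a different date (10 April)
d3 <- as.Date("04/10/2018", format = "%m/%d/%Y")
# a format that does not match the string yields NA, not an error
d4 <- as.Date("2018-10-04", format = "%d/%m/%Y")
print(c(d1, d2, d3, d4))
```

On the real data, `sum(is.na(quotes_bitcoin$Date))` after the conversion counts the rows whose strings did not match the format.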
```r
class(quotes_bitcoin$Date)
```

```
[1] "Date"
```
```r
head(quotes_bitcoin)
```

```
# A tibble: 6 × 2
  Date       Close
  <date>     <dbl>
1 2018-10-04 6548.
2 2018-10-03 6457.
3 2018-10-02 6500
4 2018-10-01 6571.
5 2018-09-30 6598.
6 2018-09-29 6579.
```
```r
glimpse(quotes_bitcoin)
```

```
Rows: 2,466
Columns: 2
$ Date  <date> 2018-10-04, 2018-10-03, 2018-10-02, 2018-10-01, 2018-09-30, 201…
$ Close <dbl> 6547.56, 6456.77, 6500.00, 6571.20, 6597.81, 6579.38, 6610.76, 6…
```
Now R treats this column as dates.
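An alternative, assuming the CSV stores dates as day/month/year text as the format string above implies, is to let readr parse the column while reading, via `col_types` and `col_date()`. A self-contained sketch on a two-row literal CSV (values copied from the `glimpse()` output above; `I()` marks the string as literal data, which readr 2.0 or later supports):

```r
library(readr)

# two-row CSV in day/month/year layout
csv_text <- "Date,Close\n04/10/2018,6547.56\n03/10/2018,6456.77\n"

prices <- read_csv(
  I(csv_text),  # I() marks the string as literal data, not a file path
  col_types = cols(
    Date  = col_date(format = "%d/%m/%Y"),  # parse dates during import
    Close = col_double()
  )
)
print(prices)
```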
Creating xts objects
```r
quotes_bitcoin <-
  xts(quotes_bitcoin[, -1],             # data columns (without the first column with date)
      order.by = quotes_bitcoin$Date)   # date/time index
```

Let's see the result:
```r
head(quotes_bitcoin)
```

```
              close
2012-01-01 5.132450
2012-01-02 5.218210
2012-01-03 4.898447
2012-01-04 5.546638
2012-01-05 6.671950
2012-01-06 6.602055
```
```r
str(quotes_bitcoin)
```

```
An xts object on 2012-01-01 / 2018-10-03 containing:
  Data:    double [2468, 1]
  Columns: close
  Index:   Date [2468] (TZ: "UTC")
```
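With the prices in an xts object, rows can be selected by ISO-8601 date strings instead of row numbers. A small sketch using the last few closing prices shown in the first `glimpse()` output above (28 September to 2 October 2018):

```r
library(xts)

# closing prices for 2018-09-28 ... 2018-10-02
dates  <- as.Date("2018-09-28") + 0:4
prices <- xts(c(6644.13, 6601.96, 6625.56, 6589.62, 6556.10), order.by = dates)
colnames(prices) <- "close"

september <- prices["2018-09"]      # every row in September 2018
since_oct <- prices["2018-10-01/"]  # open-ended range: 1 October onwards
print(september)
print(since_oct)
```

The same syntax works on `quotes_bitcoin`, e.g. `quotes_bitcoin["2013"]` for a single year.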
Finally, let's use the ggplot2 package to visualize the series. ggplot2 expects a data frame rather than an xts object, so each plot below first wraps the series in a tibble and takes the dates from the xts index.

Plotting Actual Bitcoin Price
```r
# dates from the xts index, closing prices as a plain numeric vector
tibble(date  = zoo::index(quotes_bitcoin),
       price = as.numeric(quotes_bitcoin[, 1])) %>%
  ggplot(aes(date, price)) +
  geom_line() +
  theme_bw() +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y") +
  labs(
    title = "Actual Bitcoin Price",
    subtitle = paste0("Number of observations: ", nrow(quotes_bitcoin)),
    caption = "source: RR 2024",
    x = "",
    y = ""
  )
```

Plotting Log-Transformed Bitcoin Price
```r
# dates from the xts index, log of the closing prices
tibble(date      = zoo::index(quotes_bitcoin),
       log_price = log(as.numeric(quotes_bitcoin[, 1]))) %>%
  ggplot(aes(date, log_price)) +
  geom_line() +
  theme_bw() +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y") +
  labs(
    title = "Log-Transformed Bitcoin Price",
    subtitle = paste0("Number of observations: ", nrow(quotes_bitcoin)),
    caption = "source: RR 2024",
    x = "",
    y = ""
  )
```

Plotting the First Difference of the Log Price
```r
# daily log returns, i.e. the first difference of the log price
tibble(date       = zoo::index(quotes_bitcoin),
       log_return = as.numeric(periodReturn(quotes_bitcoin, period = "daily", type = "log"))) %>%
  ggplot(aes(date, log_return)) +
  geom_line() +
  theme_bw() +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y") +
  labs(
    title = "First Difference of Log Bitcoin Price",
    subtitle = paste0("Number of observations: ", nrow(quotes_bitcoin)),
    caption = "source: RR 2024",
    x = "",
    y = ""
  )
```

#Table 1. Stationarity tests of the data.
First in-sample window (500 days)
| Data | Training sample | ADF test (p-value) | PP test (p-value) |
|---|---|---|---|
| Original data | 01/01/2012~14/05/2013 | -1.849 ( 0.642 ) | -12.235 ( 0.427 ) |
| Log transformed data | 01/01/2012~14/05/2013 | -1.521 ( 0.781 ) | -3.828 ( 0.896 ) |
| 1st difference log operator | 01/01/2012~14/05/2013 | -9.743 ( 0.010 ) | -497.980 ( 0.010 ) |
Second in-sample window (2000 days)
| Data | Training sample | ADF test (p-value) | PP test (p-value) |
|---|---|---|---|
| Original data | 01/01/2012~25/06/2017 | 0.617 ( 0.990 ) | 5.162 ( 0.990 ) |
| Log transformed data | 01/01/2012~25/06/2017 | -1.378 ( 0.842 ) | -3.367 ( 0.918 ) |
| 1st difference log operator | 01/01/2012~25/06/2017 | -11.478 ( 0.010 ) | -2103.646 ( 0.010 ) |
ADF: augmented Dickey-Fuller test; PP: Phillips-Perron test. p-values are in parentheses. For both tests the null hypothesis is a unit root, so a p-value below 0.05 indicates that the series is stationary.
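The code behind Table 1 is not shown. Assuming the tests come from `tseries::adf.test()` and `tseries::pp.test()` (whose p-values are interpolated from tables and reported within 0.01 to 0.99, which would explain the 0.010 and 0.990 entries), a minimal sketch on a simulated log-price random walk, not the Bitcoin data, looks like this:

```r
library(tseries)

set.seed(42)
# simulated log price: a random walk (unit root) starting near log(100)
log_price  <- log(100) + cumsum(rnorm(500, mean = 0, sd = 0.04))
log_return <- diff(log_price)  # first difference of the log price

# H0 for both tests: the series has a unit root (is non-stationary)
adf_level <- adf.test(log_price)
adf_diff  <- adf.test(log_return)
pp_level  <- pp.test(log_price)
pp_diff   <- pp.test(log_return)

# statistic and p-value in the same "stat ( p )" layout as Table 1
fmt <- function(t) sprintf("%.3f ( %.3f )", t$statistic, t$p.value)
print(c(ADF_level = fmt(adf_level), ADF_diff = fmt(adf_diff),
        PP_level  = fmt(pp_level),  PP_diff  = fmt(pp_diff)))
```

For the differenced series the p-values hit the 0.01 floor, and `adf.test()` warns that the true p-value is smaller than the printed one.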
#Table 2. Training-sample forecast performance.
First training-sample window (500 days)
| Forecast model | Training sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA (4,1,0) | 01/01/2012~14/05/2013 | 0.063 | 1.317 | 0.033 |
| NNAR (2,1) | 01/01/2012~14/05/2013 | 0.058 | 1.264 | 0.032 |
Second training-sample window (2000 days)
| Forecast model | Training sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA (4,1,1) | 01/01/2012~25/06/2017 | 0.048 | 0.645 | 0.027 |
| NNAR (1,2) | 01/01/2012~25/06/2017 | 0.048 | 0.641 | 0.027 |
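A minimal sketch of how models like those in Table 2 could be fit with the forecast package and scored on the training sample. Assumptions, none confirmed by the source: the models are fit to log prices (the error magnitudes in Table 2 look consistent with that), NNAR(2,1) is read as `nnetar()` with `p = 2` lagged inputs and `size = 1` hidden node, and the data are simulated. The helper `error_metrics()` is hypothetical.

```r
library(forecast)

set.seed(1)
# simulated log price series standing in for the 500-day training window
y_train <- ts(log(100) + cumsum(rnorm(500, mean = 0.002, sd = 0.05)))

# hypothetical helper: RMSE, MAPE (in percent) and MAE, skipping NA fitted values
error_metrics <- function(actual, fitted) {
  ok  <- !is.na(fitted)
  err <- actual[ok] - fitted[ok]
  c(RMSE = sqrt(mean(err^2)),
    MAPE = 100 * mean(abs(err / actual[ok])),
    MAE  = mean(abs(err)))
}

fit_arima <- Arima(y_train, order = c(4, 1, 0))  # ARIMA(4,1,0)
fit_nnar  <- nnetar(y_train, p = 2, size = 1)    # NNAR(2,1)

metrics <- rbind(ARIMA = error_metrics(y_train, fitted(fit_arima)),
                 NNAR  = error_metrics(y_train, fitted(fit_nnar)))
print(metrics)
```

`nnetar()` leaves the first `p` fitted values as `NA` (they have no lagged inputs), which is why the helper drops them.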
Figure. (a) Actual and forecasted Bitcoin price (training sample: 500 days, test sample: 1966 days); (b) concentrated view on the forecast period (test sample: 1966 days); (c) actual and forecasted Bitcoin price (training sample: 2000 days, test sample: 466 days); (d) concentrated view on the forecast period (test sample: 466 days).
Table 3. Test-sample static forecast performance.
First test-sample window (1966 days)
Forecast without re-estimation at each step
| Forecast model | Test sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA (4,1,0) | 15/05/2013~04/10/2018 | 0.373 | 2.924 | 0.230 |
| NNAR (2,1) | 15/05/2013~04/10/2018 | 0.042 | 0.357 | 0.024 |
Forecast with re-estimation at each step
| Forecast model | Test sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA | 15/05/2013~04/10/2018 | 0.312 | 2.668 | 0.205 |
| NNAR | 15/05/2013~04/10/2018 | 0.050 | 0.425 | 0.029 |
Second test-sample window (466 days)
Forecast without re-estimation at each step
| Forecast model | Test sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA (4,1,1) | 26/06/2017~04/10/2018 | 0.026 | 0.098 | 0.009 |
| NNAR (1,2) | 26/06/2017~04/10/2018 | 0.022 | 0.078 | 0.007 |
Forecast with re-estimation at each step
| Forecast model | Test sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA (4,1,1) | 26/06/2017~04/10/2018 | 0.026 | 0.097 | 0.009 |
| NNAR (1,2) | 26/06/2017~04/10/2018 | 0.031 | 0.106 | 0.009 |
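The two designs in Table 3 can be sketched with `forecast::Arima()`: passing `model =` applies an already-estimated model to new data without re-estimating its coefficients, while a loop that re-fits before each one-step-ahead forecast gives the re-estimated version. This uses simulated data and a short 30-day test window to keep it fast; the expanding estimation window is an assumption about what "re-estimation at each step" means here (a fixed-length rolling window is another common choice).

```r
library(forecast)

set.seed(7)
y <- ts(log(100) + cumsum(rnorm(530, sd = 0.05)))  # simulated log prices
n_train <- 500
n_test  <- length(y) - n_train
y_train <- window(y, end = n_train)

fit <- Arima(y_train, order = c(4, 1, 0))

# (1) without re-estimation: reuse the training coefficients on the full
#     series; the fitted values are then one-step-ahead forecasts
static_fc <- fitted(Arima(y, model = fit))[(n_train + 1):length(y)]

# (2) with re-estimation: re-fit on an expanding window before each forecast
rolling_fc <- numeric(n_test)
for (i in seq_len(n_test)) {
  y_hist <- window(y, end = n_train + i - 1)
  refit  <- Arima(y_hist, order = c(4, 1, 0))
  rolling_fc[i] <- forecast(refit, h = 1)$mean[1]
}

actual <- as.numeric(y[(n_train + 1):length(y)])
rmse   <- function(e) sqrt(mean(e^2))
result <- c(static = rmse(actual - static_fc), rolling = rmse(actual - rolling_fc))
print(result)
```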
Table 4. Diebold-Mariano (DM) tests of forecast results.
First test-sample window (1966 days)
| Test | Models compared | DM statistic | p-value |
|---|---|---|---|
| DM | ARIMA vs. NNAR (re-estimation) | -37.724 | 3.062208e-246 |
| DM1 | ARIMA vs. NNAR (without re-estimation) | -34.225 | 2.223566e-210 |
| DM2 | ARIMA (re-estimation) vs. ARIMA (without re-estimation) | 18.317 | 2.281731e-70 |
| DM3 | NNAR (re-estimation) vs. NNAR (without re-estimation) | -18.115 | 5.935986e-69 |
Second test-sample window (466 days)
| Test | Models compared | DM statistic | p-value |
|---|---|---|---|
| DM | ARIMA vs. NNAR (re-estimation) | 1.036 | 3.004223e-01 |
| DM1 | ARIMA vs. NNAR (without re-estimation) | -19.023 | 2.136618e-75 |
| DM2 | ARIMA (re-estimation) vs. ARIMA (without re-estimation) | 6.177 | 7.611747e-10 |
| DM3 | NNAR (re-estimation) vs. NNAR (without re-estimation) | -13.003 | 1.943571e-37 |
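p < 0.05 indicates a statistically significant difference in forecast accuracy between the two methods; the sign of the DM statistic shows which one is more accurate. Compared with Table 3, a negative statistic in this table corresponds to the first method having larger forecast errors, and a positive statistic to smaller ones.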
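A minimal sketch of the DM test with `forecast::dm.test()` on simulated forecast errors (`e_arima` and `e_nnar` are hypothetical stand-ins). In `dm.test(e1, e2)` the loss differential is loss(e1) minus loss(e2), so a positive statistic means the first set of errors is larger; the argument order therefore fixes the sign, while the two-sided p-value does not depend on it.

```r
library(forecast)

set.seed(123)
# hypothetical one-step-ahead forecast errors of two competing models
e_arima <- rnorm(300, sd = 0.04)
e_nnar  <- rnorm(300, sd = 0.02)

# squared-error loss (power = 2), one-step-ahead forecasts (h = 1);
# two-sided H0: both models have equal expected loss
dm <- dm.test(e_arima, e_nnar, alternative = "two.sided", h = 1, power = 2)
print(dm)

# swapping the arguments flips the sign of the statistic, not the p-value
dm_swapped <- dm.test(e_nnar, e_arima, h = 1, power = 2)
print(dm_swapped$statistic)
```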
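#Box-Pierce tests on the residuals of the ARIMA models used
p-values far below 0.05 reject the null hypothesis of no residual autocorrelation, so these models leave some autocorrelation in their residuals.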
```
	Box-Pierce test

data:  et410
X-squared = 27.863, df = 4, p-value = 0.00001329
```

```
	Box-Pierce test

data:  et411
X-squared = 27.005, df = 3, p-value = 0.000005873
```
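The output above has the format printed by `stats::Box.test()` with its default `type = "Box-Pierce"`, whose statistic is Q = n·Σ r_k² over lags k = 1, …, h. The Ljung-Box variant, Q = n(n+2)·Σ r_k²/(n−k), has better small-sample behaviour and must be requested explicitly. A minimal sketch on the residuals of an ARIMA(4,1,0) fit to simulated data; `lag = 10` is an arbitrary choice here, and `fitdf = 4` (p + q for ARIMA(4,1,0)) reduces the degrees of freedom by the number of estimated coefficients:

```r
library(forecast)

set.seed(99)
y   <- ts(log(100) + cumsum(rnorm(500, sd = 0.05)))  # simulated log prices
fit <- Arima(y, order = c(4, 1, 0))
res <- residuals(fit)

# Box-Pierce (the Box.test default) vs Ljung-Box, both with df = lag - fitdf
bp <- Box.test(res, lag = 10, type = "Box-Pierce", fitdf = 4)
lb <- Box.test(res, lag = 10, type = "Ljung-Box",  fitdf = 4)
print(bp)
print(lb)
```

Because each Ljung-Box term is weighted by (n+2)/(n−k) > 1, its statistic is never smaller than the Box-Pierce one on the same residuals.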
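#Proposed improved solution: ARIMA (6,1,1) for the 500-day training set

```
	Box-Pierce test

data:  et611
X-squared = 5.5026, df = 3, p-value = 0.1385
```

With p = 0.1385 > 0.05, the test no longer rejects the null hypothesis of no residual autocorrelation.

| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 0.006948153 | 0.06167435 | 0.03395994 | 0.2132751 | 1.364582 | 1.020184 | -0.02906356 |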
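#Proposed improved solution: ARIMA (5,1,0) for the 2000-day training set

```
	Box-Pierce test

data:  et510
X-squared = 3.942, df = 2, p-value = 0.1393
```

With p = 0.1393 > 0.05, the test no longer rejects the null hypothesis of no residual autocorrelation.

| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 0.002875649 | 0.04777167 | 0.02727576 | 0.06537294 | 0.6468885 | 0.9993979 | -0.007808208 |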